DATA & CODE FILE OVERVIEW

This repository contains the data, code, and results for the paper "Stronger historical contingency facilitates ecological specializations: an example with avian carotenoid networks", which can be found at: TBD.

Authors: 
Erin S. Morrison, Liberal Studies, New York University (erin.morrison@nyu.edu)
Caitlin M. Hill, Tucson Botanical Gardens
Alexander V. Badyaev, Department of Ecology & Evolutionary Biology, University of Arizona (abadyaev@arizona.edu)

The manuscript examines how the structure of a network of biochemical reactions facilitates transitions between adaptive states. It does so by relating the diversification of the carotenoid metabolic networks that produce plumage pigments in birds to transitions in species diets.

This data repository consists of nine data files, five code scripts, and this README document. The data and code filenames and their variables are described below.

ESM is responsible for collecting the data and writing the code.

Analysis Pipeline:
1. SupplementaryCode_S1_ConsensusPhylogeny.txt is Python code run in Unix to construct a majority rule ultrametric consensus phylogeny (SupplementaryData_S5_UltrametricConsensusTree.nex) from 1,000 trees randomly sampled from the Jetz et al. 2014 dataset on birdtree.org (Supplementary Data S4).
2. SupplementaryData_S5_UltrametricConsensusTree.nex is the majority rule ultrametric consensus tree that was used to calculate ancestral reconstructions for all carotenoid compounds and reactions using the Unix program r8s (see SupplementaryCode_S2_r8s.txt for an example of an ancestral reconstruction for the compound lutein). The ancestral reconstructions were based on the binary states of each compound and reaction at the tips of the phylogeny. These data are located in SupplementaryTable_S3_BinaryNetworks.csv. The results of the ancestral reconstructions are summarized in SupplementaryTable_S4_ancestralreconstruction.csv.
3. SupplementaryCode_S3_IndependentContrasts.R was used to calculate independent contrasts for the number of metabolically-degenerate compounds in species networks and the modified Shannon diversity index. The raw data for these measures are located in SupplementaryTable_S1_CarotenoidNetworkMeasures.csv, and this code also uses SupplementaryData_S5_UltrametricConsensusTree.nex as the phylogeny for the calculation of the independent contrasts. The output of these analyses is located in SupplementaryTable_S2_IndependentContrasts.csv.
4. SupplementaryCode_S4_Phylogeny_Fig2_FigS1_.R was used to make the phylogeny figures for Figure 2 and Figure S1. The phylogeny used in the figures is SupplementaryData_S5_UltrametricConsensusTree.nex and the data displayed in Figure 2 is based on species network data located in SupplementaryTable_S1_CarotenoidNetworkMeasures.csv.
5. SupplementaryCode_S5_Figures3-5.txt is the SAS code for the calculations and statistics displayed in Figures 3-5. The raw data for this code is from SupplementaryTable_S1_CarotenoidNetworkMeasures.csv. The resulting calculations were graphed manually in SigmaPlot 10.


Data Files:

1. SupplementaryTable_S1_CarotenoidNetworkMeasures.csv: Carotenoid network measures and diet categories of the species in this study and of all reconstructed ancestral networks. It includes the following variables:
SpeciesName	Species scientific name present in the phylogeny (with the exception of Colaptes auratus cafer). Ancestral networks correspond to internal node numbers in the majority rule consensus phylogeny (Supplementary Figure S1).
NodeN	Number of compounds in species' network
EdgesN	Number of enzymatic reactions in species' network
DietN	Number of dietary compounds in species network (compounds acquired from outside the network)
DegenN	Number of metabolically-degenerate compounds in species network (compounds derived from two or more biochemical pathways of similar lengths starting from more than one dietary compound) *Not identified for ancestral networks
EdgeS	Edge sensitivity (ξ) is the average of the fraction of compounds lost out of the total compounds in the network when a reaction is removed. When ξ = 0, the deletion of a reaction has no effect on the production of any of the other compounds. When ξ = 1, the removal of any reaction causes the entire network to disappear. Networks that only contained dietary compounds (had no reactions to lose) were assigned an edge sensitivity score of 0.
NodeS	Node sensitivity (ξ) is the average of the fraction of compounds lost out of the total compounds in the network when a compound is removed. When ξ = 0, the deletion of a compound has no effect on the production of any of the other compounds. When ξ = 1, the removal of any compound causes the entire network to disappear.
LutS	Fraction of compounds lost out of the total compounds in a species network when the dietary compound lutein is removed. Species without this dietary compound were assigned a value of 0
ZeaS	Fraction of compounds lost out of the total compounds in a species network when the dietary compound zeaxanthin is removed. Species without this dietary compound were assigned a value of 0
BCarS	Fraction of compounds lost out of the total compounds in a species network when the dietary compound 𝛃-carotene is removed. Species without this dietary compound were assigned a value of 0
BCryptS	Fraction of compounds lost out of the total compounds in a species network when the dietary compound 𝛃-cryptoxanthin is removed. Species without this dietary compound were assigned a value of 0
LutDiam	The shortest distance (number of reactions) between the dietary compound lutein and the most distant metabolically-derived or degenerate compound in a species network
ZeaDiam	The shortest distance (number of reactions) between the dietary compound zeaxanthin and the most distant metabolically-derived or degenerate compound in a species network
BcarDiam	The shortest distance (number of reactions) between the dietary compound 𝛃-carotene and the most distant metabolically-derived or degenerate compound in a species network
BcryptDiam	The shortest distance (number of reactions) between the dietary compound 𝛃-cryptoxanthin and the most distant metabolically-derived or degenerate compound in a species network
LutScope	Number of metabolically-derived and degenerate compounds in a species network that are produced via enzymatic reactions from the dietary compound lutein
ZeaScope	Number of metabolically-derived and degenerate compounds in a species network that are produced via enzymatic reactions from the dietary compound zeaxanthin
BcarScope	Number of metabolically-derived and degenerate compounds in a species network that are produced via enzymatic reactions from the dietary compound 𝛃-carotene
BcryptScope	Number of metabolically-derived and degenerate compounds in a species network that are produced via enzymatic reactions from the dietary compound 𝛃-cryptoxanthin
Shannon	Modified Shannon diversity index (see Zitnik et al. 2019), which measures the extent of connections between all of the compounds in the network. All compounds in the network are connected to each other via enzymatic reactions when Shannon diversity index = 0, and all of the compounds are isolated from each other (not linked via enzymatic reactions) when Shannon diversity index = 1.
CarotMethod	Method of carotenoid identification: HPLC = high-performance liquid chromatography; TLC = thin layer chromatography; HPLC2 = HPLC and TLC combined; HPLC3 = HPLC and mass spectrometry combined; OTH = mass spectrometry or other methods; HPLC_RAM = HPLC and Raman spectroscopy combined
CarotRef	Reference number for carotenoid identification in species (see carotenoid reference list in Supplementary Data S2)
DietClass	Diet classifications: P (Plants & Seeds) = terrestrial and/or aquatic plants, seeds, nuts, woods, or grains; F (Fruit & Nectar) = fruit, flowers, nectar, pollen, sap; I (Invertebrates) = terrestrial and/or aquatic invertebrates including insects, worms, caterpillars; V (Vertebrates, Fish, Scavengers) = non-aquatic and/or aquatic vertebrates including carrion, fish, and shrimp; O (Omnivores) = combination of at least two of the other diet classifications
DietRef	Reference number for diet composition of species (see diet reference list in Supplementary Data S1)
DietComp	Diet composition of species
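
The scope, diameter, and edge sensitivity measures defined above can be illustrated on a toy network. The following is a minimal Python sketch and not the code used in the paper. It assumes that reactions are directed (substrate, product) pairs and that a compound counts as "lost" once it can no longer be reached from any dietary compound; the compound labels (diet_1, c3, ...) are hypothetical.

```python
from collections import deque


def reachable(dietary, reactions):
    """Map each compound reachable from `dietary` to its shortest distance
    (number of reactions), using a breadth-first search."""
    adjacency = {}
    for substrate, product in reactions:
        adjacency.setdefault(substrate, []).append(product)
    dist = {d: 0 for d in dietary}
    queue = deque(dietary)
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, []):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def scope(dietary_compound, reactions):
    """Number of compounds produced from one dietary compound (cf. LutScope)."""
    return len(reachable([dietary_compound], reactions)) - 1


def diameter(dietary_compound, reactions):
    """Distance from one dietary compound to its most distant product (cf. LutDiam)."""
    return max(reachable([dietary_compound], reactions).values())


def edge_sensitivity(compounds, dietary, reactions):
    """Average fraction of compounds lost when each reaction is removed in turn.
    Assumption: 'lost' means no longer reachable from any dietary compound."""
    if not reactions:
        return 0.0  # networks with dietary compounds only are scored 0
    total = len(compounds)
    losses = []
    for i in range(len(reactions)):
        remaining = reactions[:i] + reactions[i + 1:]
        surviving = reachable(dietary, remaining)
        losses.append((total - len(surviving)) / total)
    return sum(losses) / len(losses)


# Hypothetical toy network: two dietary compounds and three derived compounds.
compounds = ["diet_1", "diet_2", "c3", "c4", "c5"]
dietary = ["diet_1", "diet_2"]
reactions = [("diet_1", "c3"), ("c3", "c4"), ("diet_2", "c4"), ("c4", "c5")]

print("scope of diet_1:", scope("diet_1", reactions))
print("diameter of diet_1:", diameter("diet_1", reactions))
print("edge sensitivity:", edge_sensitivity(compounds, dietary, reactions))
```

In this toy network, c4 is produced from both dietary compounds, so removing either of its incoming reactions loses nothing; only the two reactions without an alternative route contribute to the sensitivity score.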

2. SupplementaryTable_S2_IndependentContrasts.csv: Results of independent contrasts. It includes the following variables:
AncestralNode	Number of internal ancestral node (from Supplementary Figure S1). Internal nodes created when polytomies were randomly resolved are labeled as 300-326.
DegenN	Independent contrasts for the number of metabolically-degenerate compounds in species networks (compounds derived from two or more biochemical pathways of similar lengths starting from more than one dietary compound)
Shannon	Independent contrasts for the modified Shannon diversity index (see Zitnik et al. 2019), which measures the extent of connections between all of the compounds in the network. All compounds in the network are connected to each other via enzymatic reactions when Shannon diversity index = 0, and all of the compounds are isolated from each other (not linked via enzymatic reactions) when Shannon diversity index = 1.

3. SupplementaryTable_S3_BinaryNetworks.csv: Binary network dataset (presence/absence of compounds and reactions) in species' and ancestral networks.
Species_Scientific_Name	Species scientific name present in the phylogeny. Ancestral networks correspond to internal node numbers in the majority rule consensus phylogeny (Supplementary Figure S1).
Node_Age	Age in MYA of the ancestral nodes that correspond to the ancestral networks. Extant species are denoted by a node age of 0 MYA.
Variables with digits (1,2,3,4,5,...,110)	These numbers correspond to compounds in the avian carotenoid metabolic network (Supplementary Figure S2). 1 denotes the presence of the compound and 0 denotes the absence of the compound from the species' or ancestral network.
Variables with digits separated by dashes (1-8, 1-9,...,8-10)	These entries correspond to reactions between compounds in the avian carotenoid metabolic network (Supplementary Figure S2); the numbers on either side of the dash are the compound numbers. 1 denotes the presence of a reaction and 0 denotes the absence of a reaction from the species' or ancestral network.
Comments	List of edits made to the ancestral network if the reconstruction was not biologically functional
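
As an illustration of the column layout described above, the following Python sketch turns one row of the binary table into a set of compounds and a set of reactions. It is a sketch under stated assumptions, not part of the original analysis: the file is assumed to be comma-delimited with the column labels as its header row, the two-row SAMPLE text is a made-up miniature of the real table, and reading "1-8" as an ordered (first compound, second compound) pair is an assumption.

```python
import csv
import io

# Hypothetical miniature of the table; the real file has compound columns
# 1..110 and many more reaction columns.
SAMPLE = """Species_Scientific_Name,Node_Age,1,8,9,1-8,1-9,Comments
Example species,0,1,1,0,1,0,
"""


def parse_network(row):
    """Split one row into the compounds and reactions marked present (value 1)."""
    compounds, reactions = set(), set()
    for column, value in row.items():
        if (value or "").strip() != "1":
            continue  # absent (0), empty, or a non-binary field
        if column.isdigit():
            compounds.add(int(column))  # compound column, e.g. "8"
        else:
            parts = column.split("-")
            if len(parts) == 2 and all(p.isdigit() for p in parts):
                # reaction column, e.g. "1-8"
                reactions.add((int(parts[0]), int(parts[1])))
    return compounds, reactions


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
compounds, reactions = parse_network(rows[0])
print(rows[0]["Species_Scientific_Name"], sorted(compounds), sorted(reactions))
```

To read the actual file, the io.StringIO(SAMPLE) argument would be replaced with an open file handle for SupplementaryTable_S3_BinaryNetworks.csv. Non-network columns such as Node_Age and Comments are ignored because their names match neither pattern.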

4. SupplementaryTable_S4_ancestralreconstruction.csv: Parameters for both of the rate models for the ancestral reconstructions, calculated using r8s (SupplementaryCode_S2_r8s.txt), of each compound and reaction in the avian metabolic network.
comp_rxn	compound or reaction in network
model	which model was used for ancestral state reconstruction, either binary-1 (gain/loss rates are equal) or binary-2 (gain/loss rates differ)
k	number of parameters in the model
s	rate of gain (state changes/million years)
r	rate of loss (state changes/million years)
log L	likelihood of the model using joint (conditional) maximum likelihood reconstruction
AIC	Akaike Information Criterion for testing best fit of model. AIC = 2k - 2 log L, and the model with the lower score has the better fit
ModelFit	Y denotes that this model fits the data better according to AIC, and reconstructions from this model were used to construct ancestral states of the compound or reaction. N denotes the model that was not used.
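
The model comparison described by the AIC and ModelFit variables can be sketched as follows. The k and log L values here are hypothetical and are not taken from the table; the sketch only illustrates AIC = 2k - 2 log L and the choice of the lower-scoring model.

```python
def aic(k, log_likelihood):
    """Akaike Information Criterion: AIC = 2k - 2 log L."""
    return 2 * k - 2 * log_likelihood


# Hypothetical parameter counts and log-likelihoods for one compound.
candidates = {
    "binary-1": {"k": 1, "logL": -42.0},  # gain and loss rates equal
    "binary-2": {"k": 2, "logL": -40.5},  # gain and loss rates differ
}

scores = {model: aic(v["k"], v["logL"]) for model, v in candidates.items()}
best = min(scores, key=scores.get)  # lower AIC = better fit

for model, score in scores.items():
    flag = "Y" if model == best else "N"
    print(model, score, flag)
```

With these made-up numbers the extra parameter of binary-2 is justified by its higher likelihood, so it would receive ModelFit = Y.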

5.	SupplementaryData_S1_DietReferences.txt: The bibliography for diet references (DietRef) listed in Supplementary Table S1.

6.	SupplementaryData_S2_CarotenoidReferences.txt: The bibliography for carotenoid references (CarotRef) listed in Supplementary Table S1.

7.	SupplementaryData_S3_PhylogenyAppendix.nex: Nexus file with the majority rule ultrametric consensus tree used for phylogeny figures and the ancestral reconstructions, and the phylogeny with randomly resolved polytomies used for independent contrasts.

8.	SupplementaryData_S4_JetzPhylogeny.nex: Nexus file of the 1,000 trees randomly sampled from the pseudo-posterior distribution of the Stage2 MayrAll Hackett dataset from birdtree.org. Used to construct the ultrametric 50% majority-rule consensus tree in Supplementary Data S3.

9.	SupplementaryData_S5_UltrametricConsensusTree.nex: Nexus file of the majority rule ultrametric consensus tree used for Figures 2 and S1 (SupplementaryCode_S4_Phylogeny_Fig2_FigS1_.R), the ancestral reconstructions (SupplementaryCode_S2_r8s.txt), and the independent contrasts (SupplementaryCode_S3_IndependentContrasts.R).

Code Scripts:

1.	SupplementaryCode_S1_ConsensusPhylogeny.txt: Python code run in Unix to construct a majority rule ultrametric consensus phylogeny (Supplementary Data S3) from 1,000 trees randomly sampled from the Jetz et al. 2014 dataset on birdtree.org (Supplementary Data S4).

2.	SupplementaryCode_S2_r8s.txt: An example of the Unix r8s code used for the ancestral reconstruction of lutein (compound 1, see Supplementary Table S3 for binary network data). The format is the same for all compounds and reactions. The ancestral reconstructions require SupplementaryData_S5_UltrametricConsensusTree.nex, and the output is reported in SupplementaryTable_S4_ancestralreconstruction.csv.

3.	SupplementaryCode_S3_IndependentContrasts.R: The R code used for calculating the independent contrasts using SupplementaryData_S5_UltrametricConsensusTree.nex and the raw data in SupplementaryTable_S1_CarotenoidNetworkMeasures.csv.

4.	SupplementaryCode_S4_Phylogeny_Fig2_FigS1_.R: The R code used to build Figure 2 and Figure S1 in the manuscript, based on SupplementaryData_S5_UltrametricConsensusTree.nex and the raw data in SupplementaryTable_S1_CarotenoidNetworkMeasures.csv.

5.	SupplementaryCode_S5_Figures3-5.txt: The SAS code for the calculations and statistics displayed in Figures 3-5. The raw data for this code are from SupplementaryTable_S1_CarotenoidNetworkMeasures.csv. The resulting calculations were graphed manually in SigmaPlot 10.